AITopics | contextual bandit

We study contextual bandits in the stochastic i.i.d. setting, where a learner observes contexts drawn from an unknown distribution, selects actions from a finite set $A$, and aims to identify an approximately optimal policy from a given class based on bandit feedback. Motivated by bandit multiclass classification with zero-one rewards, we focus on the $s$-sparse setting in which, for every context, the reward vector has $L_1$-norm at most $s \ll |A|$. Our main result is the design of algorithms that, with high probability, output an $ε$-optimal policy compared to policy class $Π$ using $\tilde{O}((s/ε^2 + |A|/ε)\log(|Π|/δ))$ samples. We extend this bound to general Natarajan classes and complement it with a matching lower bound (up to logarithmic factors), thereby closing a substantial gap left by prior work (Erez et al., 2024, 2025), which incurred an additional $Θ(|A|^9)$ dependence. We obtain these results via two complementary approaches. First, we analyze contextual bandits through the lens of contextual decision making with structured observations, designing an exploration-by-optimization algorithm whose sample complexity is governed by the decision-estimation coefficient (DEC; Foster et al., 2021, 2022). We show that, with $s$-sparse rewards, the induced model class admits a sharp DEC bound that scales with $s$ and directly yields the optimal rate. Since this approach is largely information-theoretic and involves solving complex min-max optimization problems, we also develop a second, more specialized algorithmic method based on a low-variance exploration technique. This approach leads to concrete, tractable algorithms and naturally extends to contextual combinatorial semi-bandits, leading to improved sample complexity guarantees for bandit multiclass list classification.
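To get a feel for how the stated rate scales, the following minimal sketch evaluates only the leading-order expression from the abstract. It assumes the logarithm is taken over $|Π|/δ$ and drops every constant and polylogarithmic factor hidden by the $\tilde{O}$, so the numbers show relative scaling rather than actual sample counts. The function name and the example parameter values are hypothetical.

```python
import math


def sparse_bandit_sample_bound(s, num_actions, eps, policy_class_size, delta):
    """Leading-order term (s/eps^2 + |A|/eps) * log(|Pi|/delta).

    Assumption: constants and polylog factors hidden by the O-tilde are
    omitted, so this illustrates scaling only.
    """
    if not (0 < eps <= 1 and 0 < delta < 1):
        raise ValueError("eps must lie in (0, 1] and delta in (0, 1)")
    if s <= 0 or num_actions <= 0 or policy_class_size < 1:
        raise ValueError("s, num_actions and policy_class_size must be positive")
    return (s / eps**2 + num_actions / eps) * math.log(policy_class_size / delta)


# Hypothetical parameters: 100 actions, accuracy 0.1, 1000 policies, delta 0.05.
sparse = sparse_bandit_sample_bound(
    s=1, num_actions=100, eps=0.1, policy_class_size=1000, delta=0.05
)
dense = sparse_bandit_sample_bound(
    s=100, num_actions=100, eps=0.1, policy_class_size=1000, delta=0.05
)
print(f"s = 1: {sparse:.0f}   s = |A|: {dense:.0f}   ratio: {dense / sparse:.1f}")
```

With these values the $|A|/ε$ term dominates in the sparse case and the $s/ε^2$ term dominates once $s = |A|$, which is the regime change the bound describes.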

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2605.29645

Country: Asia > Middle East > Israel (0.28)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)

Robust Sequential Experimental Design for A/B Testing

Wen, Qianglin, Wu, Xiangkun, Shi, Chengchun, Li, Ting, Tang, Niansheng, Zhang, Yingying, Zhu, Hongtu

arXiv.org Machine Learning, May-14-2026

Experimental design has emerged as a powerful approach for improving the sample efficiency of A/B testing, yet existing designs rely critically on correctly specified models. We study robust sequential experimental design under model misspecification and develop a unified framework that covers both contextual bandit and dynamic settings. Theoretically, we prove that our design bounds the worst-case mean squared error of the estimated treatment effect. Empirically, we demonstrate the effectiveness of the proposed approach using synthetic and real-world datasets from a leading technology company.

artificial intelligence, machine learning, robust sequential experimental design, (16 more...)

arXiv.org Machine Learning

2605.12899

Country:

Asia > China (0.46)
North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.68)
Transportation (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
(2 more...)

d54e440c92affd396117e161bbab5e78-Paper-Conference.pdf

Neural Information Processing Systems, May-1-2026, 04:47:33 GMT

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.49)

cba76ef96c4cd625631ab4d33285b045-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 19:04:43 GMT

data mining, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country: Asia > China (0.29)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Data Science > Data Mining > Big Data (0.46)

c4e380fb74dec9da9c7212e834657aa9-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 17:03:35 GMT

artificial intelligence, communication cost, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Texas (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Neural Information Processing Systems, Apr-27-2026, 01:58:01 GMT

Linear contextual bandits represent a fundamental class of models with numerous real-world applications, and it is critical to develop algorithms that can effectively manage noise with unknown variance while ensuring provable guarantees in both the worst-case constant-variance noise setting and the deterministic-reward setting.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

2567c95fd41459a98a73ba893775d22a-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-26-2026, 00:07:25 GMT

machine learning, natural language, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

2567c95fd41459a98a73ba893775d22a-Paper-Conference.pdf

Neural Information Processing Systems, Apr-26-2026, 00:07:22 GMT

machine learning, natural language, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Experiment Planning with Function Approximation

Neural Information Processing Systems, Apr-25-2026, 16:03:22 GMT

We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where deploying adaptive algorithms carries significant overhead (for example, when the data collection policies must be executed in a distributed fashion, or when a human in the loop is needed to implement them), producing a set of data collection policies in advance is paramount. We study the setting where a large dataset of contexts, but not rewards, is available and may be used by the learner to design an effective data collection strategy. Although this problem has been well studied when rewards are linear [53], results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension [42] of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We conclude by introducing a statistical gap that fleshes out the fundamental differences between planning and adaptive learning, and we provide results for planning with model selection.
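The uniform-sampler idea can be illustrated with a small sketch. This is not the paper's procedure: it assumes a toy tabular problem with three discrete contexts and deterministic rewards, and it replaces function approximation with plain per-pair averaging. All names (`plan_uniform`, `estimate_means`, `greedy_policy`, `toy_reward`) are hypothetical. The point it shows is that the whole data collection plan is fixed from contexts alone, before any reward is observed.

```python
import random
from collections import defaultdict


def plan_uniform(contexts, num_actions, seed=0):
    """Choose one action per context uniformly at random, before seeing rewards."""
    rng = random.Random(seed)
    return [(x, rng.randrange(num_actions)) for x in contexts]


def estimate_means(plan, reward_fn):
    """Execute the fixed plan and average observed rewards per (context, action)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, a in plan:
        totals[(x, a)] += reward_fn(x, a)
        counts[(x, a)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def greedy_policy(means, num_actions):
    """Return a policy that plays the action with the highest estimated mean."""
    def policy(x):
        return max(range(num_actions), key=lambda a: means.get((x, a), 0.0))
    return policy


# Toy problem (assumption): the best action for context x is action x.
def toy_reward(x, a):
    return 1.0 if a == x else 0.0


NUM_ACTIONS = 3
contexts = [x for x in range(3) for _ in range(200)]  # unlabeled contexts only
plan = plan_uniform(contexts, NUM_ACTIONS)            # fixed in advance
policy = greedy_policy(estimate_means(plan, toy_reward), NUM_ACTIONS)
print([policy(x) for x in range(3)])
```

Because the plan never depends on observed rewards, it could be handed to distributed workers or human operators as a static list, which is the deployment constraint the abstract describes.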

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Experiment Planning with Function Approximation

Neural Information Processing Systems, Apr-25-2026, 16:03:18 GMT

We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where deploying adaptive algorithms carries significant overhead (for example, when the data collection policies must be executed in a distributed fashion, or when a human in the loop is needed to implement them), producing a set of data collection policies in advance is paramount. We study the setting where a large dataset of contexts, but not rewards, is available and may be used by the learner to design an effective data collection strategy. Although this problem has been well studied when rewards are linear [53], results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension [42] of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We conclude by introducing a statistical gap that fleshes out the fundamental differences between planning and adaptive learning, and we provide results for planning with model selection.

data mining, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Data Science > Data Mining > Big Data (0.49)

The Sample Complexity of Multiclass and Sparse Contextual Bandits
